Welcome to the Notebook

Importing modules

Task 1

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt 
print('modules are imported')
modules are imported

Task 1.1:

Loading the Dataset

In [2]:
dataset_url = 'https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv'
df = pd.read_csv(dataset_url)
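
By default `read_csv` loads the `Date` column as plain strings. The sketch below shows one way to get real datetime values with `parse_dates`. It is only an illustration: the two inline rows stand in for the remote file, and the name `sample` is made up for this example.

```python
import io
import pandas as pd

# Inline stand-in for the remote CSV (assumption: same header as the real file).
sample_csv = io.StringIO(
    "Date,Country,Confirmed,Recovered,Deaths\n"
    "2020-01-22,Afghanistan,0,0.0,0\n"
    "2021-03-07,Zimbabwe,36271,33834.0,1485\n"
)

# parse_dates converts the listed columns to datetime64 while reading.
sample = pd.read_csv(sample_csv, parse_dates=['Date'])
print(sample.dtypes)
```

With datetime values, comparisons and date arithmetic on the column work directly, although the string form used in this notebook is enough for the plots that follow.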

Task 1.2:

let's check the dataframe

In [3]:
df.head()
Out[3]:
Date Country Confirmed Recovered Deaths
0 2020-01-22 Afghanistan 0 0.0 0
1 2020-01-23 Afghanistan 0 0.0 0
2 2020-01-24 Afghanistan 0 0.0 0
3 2020-01-25 Afghanistan 0 0.0 0
4 2020-01-26 Afghanistan 0 0.0 0
In [4]:
df.tail()
Out[4]:
Date Country Confirmed Recovered Deaths
78907 2021-03-03 Zimbabwe 36179 33392.0 1478
78908 2021-03-04 Zimbabwe 36223 33632.0 1483
78909 2021-03-05 Zimbabwe 36248 33759.0 1484
78910 2021-03-06 Zimbabwe 36260 33805.0 1485
78911 2021-03-07 Zimbabwe 36271 33834.0 1485

let's check the shape of the dataframe

In [5]:
df.shape
Out[5]:
(78912, 5)

Task 2.1 :

let's do some preprocessing

In [6]:
df = df[df.Confirmed > 0]
In [7]:
df.head()
Out[7]:
Date Country Confirmed Recovered Deaths
33 2020-02-24 Afghanistan 1 0.0 0
34 2020-02-25 Afghanistan 1 0.0 0
35 2020-02-26 Afghanistan 1 0.0 0
36 2020-02-27 Afghanistan 1 0.0 0
37 2020-02-28 Afghanistan 1 0.0 0
In [8]:
df[df.Country == 'Italy']
Out[8]:
Date Country Confirmed Recovered Deaths
34944 2020-01-31 Italy 2 0.0 0
34945 2020-02-01 Italy 2 0.0 0
34946 2020-02-02 Italy 2 0.0 0
34947 2020-02-03 Italy 2 0.0 0
34948 2020-02-04 Italy 2 0.0 0
... ... ... ... ... ...
35341 2021-03-03 Italy 2976274 2440218.0 98635
35342 2021-03-04 Italy 2999119 2453706.0 98974
35343 2021-03-05 Italy 3023129 2467388.0 99271
35344 2021-03-06 Italy 3046762 2481372.0 99578
35345 2021-03-07 Italy 3067486 2494839.0 99785

402 rows × 5 columns

let's see Global spread of Covid19

In [9]:
fig = px.choropleth(df, locations = 'Country', locationmode='country names', color='Confirmed'
                   ,animation_frame='Date')
fig.update_layout(title_text = 'Global Spread of COVID19')
fig.show()

Task 2.2 : Exercise

Let's see Global deaths of Covid19

In [10]:
fig = px.choropleth(df, locations = 'Country', locationmode='country names', color='Deaths'
                   ,animation_frame='Date')
fig.update_layout(title_text = 'Global Deaths of COVID19')
fig.show()

Task 3.1:

Let's visualize how intensive the Covid19 transmission has been in each country

let's start with an example:

In [11]:
df_china = df[df.Country == 'China']
df_china.head()
Out[11]:
Date Country Confirmed Recovered Deaths
14796 2020-01-22 China 548 28.0 17
14797 2020-01-23 China 643 30.0 18
14798 2020-01-24 China 920 36.0 26
14799 2020-01-25 China 1406 39.0 42
14800 2020-01-26 China 2075 49.0 56

let's select the columns that we need

In [12]:
df_china = df_china[['Date', 'Confirmed']]
In [13]:
df_china.head()
Out[13]:
Date Confirmed
14796 2020-01-22 548
14797 2020-01-23 643
14798 2020-01-24 920
14799 2020-01-25 1406
14800 2020-01-26 2075

calculating the first derivative (day-to-day difference) of the Confirmed column

In [14]:
df_china['Infection Rate'] = df_china['Confirmed'].diff()
In [15]:
df_china.head()
Out[15]:
Date Confirmed Infection Rate
14796 2020-01-22 548 NaN
14797 2020-01-23 643 95.0
14798 2020-01-24 920 277.0
14799 2020-01-25 1406 486.0
14800 2020-01-26 2075 669.0
In [16]:
px.line(df_china, x = 'Date', y = ['Confirmed', 'Infection Rate'])
In [17]:
df_china['Infection Rate'].max()
Out[17]:
15136.0
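
To make the `diff()` step concrete, here is a small sketch on the first five China values shown above. Each entry is the current day's cumulative count minus the previous day's, so the first entry has no predecessor and comes out as NaN. The variable names are chosen just for this example.

```python
import pandas as pd

# First five cumulative confirmed counts for China, copied from the table above.
confirmed = pd.Series([548, 643, 920, 1406, 2075], name='Confirmed')

# diff() subtracts the previous row from each row; the first row becomes NaN.
daily_new = confirmed.diff()

print(daily_new.tolist())
print(daily_new.max())  # max() skips the NaN
```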

Task 3.2:

Let's Calculate Maximum infection rate for all of the countries

In [18]:
df.head()
Out[18]:
Date Country Confirmed Recovered Deaths
33 2020-02-24 Afghanistan 1 0.0 0
34 2020-02-25 Afghanistan 1 0.0 0
35 2020-02-26 Afghanistan 1 0.0 0
36 2020-02-27 Afghanistan 1 0.0 0
37 2020-02-28 Afghanistan 1 0.0 0
In [19]:
countries = list(df['Country'].unique())
max_infection_rates = []
for c in countries :
    MIR = df[df.Country == c].Confirmed.diff().max()
    max_infection_rates.append(MIR)
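
The loop above filters the dataframe once per country. As a possible alternative, the same numbers can be obtained with `groupby`. The sketch below compares both approaches on a toy frame; the values in `toy` are illustrative and not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for df (illustrative values only).
toy = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Confirmed': [1, 4, 9, 2, 3, 10],
})

# Loop version, same logic as the cell above.
loop_result = {c: toy[toy.Country == c].Confirmed.diff().max()
               for c in toy['Country'].unique()}

# groupby version: diff() runs within each country, then max() reduces it.
grouped = toy.groupby('Country')['Confirmed'].apply(lambda s: s.diff().max())

print(grouped.to_dict())
```

Both versions give the same result; the groupby form returns a Series indexed by country, which can be turned into a dataframe with `reset_index()`.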
   

Task 3.3:

let's create a new Dataframe

In [20]:
df_MIR = pd.DataFrame()
df_MIR['Country'] = countries
df_MIR['Max Infection Rate'] = max_infection_rates
df_MIR.head()
Out[20]:
Country Max Infection Rate
0 Afghanistan 1485.0
1 Albania 1239.0
2 Algeria 1133.0
3 Andorra 299.0
4 Angola 355.0

Let's plot the barchart : maximum infection rate of each country

In [21]:
px.bar(df_MIR, x= 'Country', y='Max Infection Rate', 
       color='Country', 
       title='Global Maximum Infection Rate')
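
With one bar per country the chart gets crowded. One option, sketched here on the five rows shown in `df_MIR.head()`, is to keep only the largest values before plotting. The names `mir_sample` and `top3` are made up for this example.

```python
import pandas as pd

# The five rows from df_MIR.head() above.
mir_sample = pd.DataFrame({
    'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'Max Infection Rate': [1485.0, 1239.0, 1133.0, 299.0, 355.0],
})

# nlargest keeps the n rows with the highest value in the given column.
top3 = mir_sample.nlargest(3, 'Max Infection Rate')
print(top3['Country'].tolist())
```

The reduced frame can be passed to `px.bar` in the same way as `df_MIR`.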

Task 4: Let's see how the national lockdown impacted Covid19 transmission in Italy

COVID19 pandemic lockdown in Italy

On 9 March 2020, the government of Italy under Prime Minister Giuseppe Conte imposed a national quarantine, restricting the movement of the population except for necessity, work, and health circumstances, in response to the growing pandemic of COVID-19 in the country.

In [22]:
italy_lockdown_start_date = '2020-03-09'
italy_lockdown_a_month_later = '2020-04-09'
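
The "a month later" date is typed by hand here. As a small sketch, it could also be derived from the start date with `pd.DateOffset`; the helper name `one_month_after` is hypothetical and not part of the notebook.

```python
import pandas as pd

def one_month_after(date_str):
    """Return the date one calendar month after date_str as 'YYYY-MM-DD'."""
    later = pd.Timestamp(date_str) + pd.DateOffset(months=1)
    return later.strftime('%Y-%m-%d')

print(one_month_after('2020-03-09'))
```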
In [23]:
df.head()
Out[23]:
Date Country Confirmed Recovered Deaths
33 2020-02-24 Afghanistan 1 0.0 0
34 2020-02-25 Afghanistan 1 0.0 0
35 2020-02-26 Afghanistan 1 0.0 0
36 2020-02-27 Afghanistan 1 0.0 0
37 2020-02-28 Afghanistan 1 0.0 0

let's get data related to italy

In [24]:
df_italy = df[df.Country == 'Italy']

let's check the dataframe

In [25]:
df_italy.head()
Out[25]:
Date Country Confirmed Recovered Deaths
34944 2020-01-31 Italy 2 0.0 0
34945 2020-02-01 Italy 2 0.0 0
34946 2020-02-02 Italy 2 0.0 0
34947 2020-02-03 Italy 2 0.0 0
34948 2020-02-04 Italy 2 0.0 0

let's calculate the infection rate in Italy

In [26]:
df_italy = df_italy.copy()  # explicit copy, so the new column is not set on a slice of df
df_italy['Infection Rate'] = df_italy.Confirmed.diff()
df_italy.head()
Out[26]:
Date Country Confirmed Recovered Deaths Infection Rate
34944 2020-01-31 Italy 2 0.0 0 NaN
34945 2020-02-01 Italy 2 0.0 0 0.0
34946 2020-02-02 Italy 2 0.0 0 0.0
34947 2020-02-03 Italy 2 0.0 0 0.0
34948 2020-02-04 Italy 2 0.0 0 0.0

ok! now let's do the visualization

In [27]:
fig = px.line(df_italy, x= 'Date', y='Infection Rate', title= "Before and After Lockdown Italy")
fig.add_shape(
    dict(
    type= "line",
    x0= italy_lockdown_start_date,
    y0= 0,
    x1= italy_lockdown_start_date,
    y1= df_italy['Infection Rate'].max(),
    line = dict(color= 'red', width =2)
    )
)
fig.add_annotation(
    dict(
    x= italy_lockdown_start_date,
    y= df_italy['Infection Rate'].max(),
    text = 'starting date of the lockdown'
    )
)
fig.add_shape(
    dict(
    type= "line",
    x0= italy_lockdown_a_month_later,
    y0= 0,
    x1= italy_lockdown_a_month_later,
    y1= df_italy['Infection Rate'].max(),
    line = dict(color= 'yellow', width =2)
    )
)
fig.add_annotation(
    dict(
    x= italy_lockdown_a_month_later,
    y= 0,
    text = 'a month later'
    )
)
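
The shape and annotation dictionaries above follow the same pattern for every marker line. A minimal sketch of a helper that builds them is shown below. The function name `lockdown_marker` and its parameters are hypothetical, and the `y_top` value in the usage line is a placeholder rather than a number from the data.

```python
def lockdown_marker(date, y_top, color, label, label_y):
    """Build the shape and annotation dicts for one vertical marker line."""
    shape = dict(
        type='line',
        x0=date, y0=0,        # the line starts on the x axis
        x1=date, y1=y_top,    # and goes straight up to y_top
        line=dict(color=color, width=2),
    )
    annotation = dict(x=date, y=label_y, text=label)
    return shape, annotation

# Placeholder height of 1.0; in the notebook this would be the column maximum.
shape, annotation = lockdown_marker('2020-03-09', 1.0, 'red',
                                    'starting date of the lockdown', 1.0)
print(shape)
print(annotation)
```

The returned dictionaries can then be passed to `fig.add_shape(shape)` and `fig.add_annotation(annotation)`, as in the cell above.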

Task 5: Let's see how the national lockdown impacted Covid19 deaths in Italy

In [28]:
df_italy.head()
Out[28]:
Date Country Confirmed Recovered Deaths Infection Rate
34944 2020-01-31 Italy 2 0.0 0 NaN
34945 2020-02-01 Italy 2 0.0 0 0.0
34946 2020-02-02 Italy 2 0.0 0 0.0
34947 2020-02-03 Italy 2 0.0 0 0.0
34948 2020-02-04 Italy 2 0.0 0 0.0

let's calculate the number of new deaths day by day

In [29]:
df_italy['Deaths Rate'] = df_italy.Deaths.diff()

let's check the dataframe again

In [30]:
df_italy.head()
Out[30]:
Date Country Confirmed Recovered Deaths Infection Rate Deaths Rate
34944 2020-01-31 Italy 2 0.0 0 NaN NaN
34945 2020-02-01 Italy 2 0.0 0 0.0 0.0
34946 2020-02-02 Italy 2 0.0 0 0.0 0.0
34947 2020-02-03 Italy 2 0.0 0 0.0 0.0
34948 2020-02-04 Italy 2 0.0 0 0.0 0.0

now let's plot a line chart to compare the impact of the national lockdown on the spread of the virus and on the number of deaths

In [31]:
fig = px.line(df_italy,x='Date',y=['Infection Rate','Deaths Rate'])
fig.show()

let's normalize the columns

In [32]:
df_italy['Infection Rate'] = df_italy['Infection Rate']/df_italy['Infection Rate'].max()
df_italy['Deaths Rate'] = df_italy['Deaths Rate']/df_italy['Deaths Rate'].max()

let's plot the line chart again

In [33]:
fig = px.line(df_italy, x='Date', y=['Infection Rate', 'Deaths Rate'])
fig.add_shape(
    dict(
    type= "line",
    x0= italy_lockdown_start_date,
    y0= 0,
    x1= italy_lockdown_start_date,
    y1= df_italy['Infection Rate'].max(),
    line = dict(color= 'green', width =2)
    )
)
fig.add_annotation(
    dict(
    x= italy_lockdown_start_date,
    y= df_italy['Infection Rate'].max(),
    text = 'starting date of the lockdown'
    )
)
fig.add_shape(
    dict(
    type= "line",
    x0= italy_lockdown_a_month_later,
    y0= 0,
    x1= italy_lockdown_a_month_later,
    y1= df_italy['Infection Rate'].max(),
    line = dict(color= 'orange', width =2)
    )
)
fig.add_annotation(
    dict(
    x= italy_lockdown_a_month_later,
    y= 0,
    text = 'a month later'
    )
)
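
The normalization used for this plot divides each column by its own maximum, so both curves peak at 1 and can share one axis. Here is a small sketch on toy values derived from the first China rows shown earlier (daily differences of Confirmed and Deaths); the frame name `rates` is made up for this example.

```python
import pandas as pd

# Daily differences of the first five China rows (first entry has no predecessor).
rates = pd.DataFrame({
    'Infection Rate': [float('nan'), 95.0, 277.0, 486.0, 669.0],
    'Deaths Rate': [float('nan'), 1.0, 8.0, 16.0, 14.0],
})

# rates.max() is the per-column maximum; the division is applied column by column.
normalized = rates / rates.max()
print(normalized)
```

After this step the values are relative to each column's peak, so the plot shows the shape and timing of the two curves rather than their absolute size.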

COVID19 pandemic lockdown in Germany

Lockdown was started in Freiburg, Baden-Württemberg and Bavaria on 20 March 2020. Three days later, it was expanded to the whole of Germany.

In [34]:
Germany_lockdown_start_date = '2020-03-23' 
Germany_lockdown_a_month_later = '2020-04-23'

let's select the data related to Germany

In [35]:
df_germany = df[df.Country == 'Germany']

let's check the dataframe

In [36]:
df_germany.head()
Out[36]:
Date Country Confirmed Recovered Deaths
27131 2020-01-27 Germany 1 0.0 0
27132 2020-01-28 Germany 4 0.0 0
27133 2020-01-29 Germany 4 0.0 0
27134 2020-01-30 Germany 4 0.0 0
27135 2020-01-31 Germany 5 0.0 0

let's calculate the infection rate and deaths rate in Germany

In [37]:
df_germany = df_germany.copy()  # explicit copy, so the new columns are not set on a slice of df
df_germany['Infection Rate'] = df_germany.Confirmed.diff()
df_germany['Deaths Rate'] = df_germany.Deaths.diff()

let's check the dataframe

In [38]:
df_germany.head()
Out[38]:
Date Country Confirmed Recovered Deaths Infection Rate Deaths Rate
27131 2020-01-27 Germany 1 0.0 0 NaN NaN
27132 2020-01-28 Germany 4 0.0 0 3.0 0.0
27133 2020-01-29 Germany 4 0.0 0 0.0 0.0
27134 2020-01-30 Germany 4 0.0 0 0.0 0.0
27135 2020-01-31 Germany 5 0.0 0 1.0 0.0

now let's normalize the columns and plot the line chart

In [39]:
df_germany['Infection Rate'] = df_germany['Infection Rate']/df_germany['Infection Rate'].max()
df_germany['Deaths Rate'] = df_germany['Deaths Rate']/df_germany['Deaths Rate'].max()
In [40]:
fig = px.line(df_germany, x='Date', y=['Infection Rate', 'Deaths Rate'])
fig.add_shape(
    dict(
    type= "line",
    x0= Germany_lockdown_start_date,
    y0= 0,
    x1= Germany_lockdown_start_date,
    y1= df_germany['Infection Rate'].max(),
    line = dict(color= 'black', width =2)
    )
)
fig.add_annotation(
    dict(
    x= Germany_lockdown_start_date,
    y= df_germany['Infection Rate'].max(),
    text = 'starting date of the lockdown'
    )
)
fig.add_shape(
    dict(
    type= "line",
    x0= Germany_lockdown_a_month_later,
    y0= 0,
    x1= Germany_lockdown_a_month_later,
    y1= df_germany['Infection Rate'].max(),
    line = dict(color= 'green', width =2)
    )
)
fig.add_annotation(
    dict(
    x= Germany_lockdown_a_month_later,
    y= 0,
    text = 'a month later'
    )
)
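
The Italy and Germany sections repeat the same steps: select one country, take daily differences, and normalize. A minimal sketch of a reusable function is given below. The name `country_rates` is hypothetical, and the `toy` frame holds illustrative values, not rows from the dataset.

```python
import pandas as pd

def country_rates(df, country):
    """Return a copy of one country's rows with normalized daily rate columns."""
    out = df[df.Country == country].copy()  # copy avoids SettingWithCopyWarning
    out['Infection Rate'] = out['Confirmed'].diff()
    out['Deaths Rate'] = out['Deaths'].diff()
    for col in ['Infection Rate', 'Deaths Rate']:
        out[col] = out[col] / out[col].max()  # scale so the peak equals 1
    return out

# Illustrative values only.
toy = pd.DataFrame({
    'Date': ['2020-01-27', '2020-01-28', '2020-01-29', '2020-01-27'],
    'Country': ['Germany', 'Germany', 'Germany', 'Italy'],
    'Confirmed': [1, 4, 4, 2],
    'Deaths': [0, 1, 3, 0],
})

result = country_rates(toy, 'Germany')
print(result)
```

Because the function works on a copy, the input frame is left unchanged and the same call can be repeated for any other country.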